Abstract. Still and multi-media images are subject to transformations for compression,
steganographic embedding and digital watermarking. In a major program of
activities we are engaged in the modeling, design and analysis of digital content.
Statistical and pattern classification techniques should be combined with understanding
of run length, transform coding techniques, and also encryption techniques.
1 Introduction

Steganography is the art and science of secret communication, aiming to conceal the existence
of the communication. Steganalysis is the art of seeing the unseen. With the advent of computers,
hiding information inside digital carriers, especially multimedia files such as audio (.wav) files and
images (.bmp, .pnm, .jpg), is becoming popular [1, 5]. Digital images are the most common carriers
for hidden messages. The process of hiding information is called embedding. Least Significant
Bit (LSB) embedding is the most widely used steganographic technique. In LSB embedding,
the LSBs of the pixels of an uncompressed image are replaced with the message bits. The amount
of embedding (the number of bits embedded), referred to as the level, is given as a percentage of
the total number of pixels.
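
As a minimal sketch of LSB embedding at a given level, the following Python fragment overwrites the least significant bit of a chosen fraction of pixels. The function name `lsb_embed`, the seeded pseudo-random choice of positions and the toy data are assumptions made for illustration; real tools select positions and pre-process the message in their own ways.

```python
import random

def lsb_embed(pixels, message_bits, level, seed=0):
    """Return a copy of `pixels` with the LSB overwritten at a fraction `level` of positions.

    pixels: list of ints in 0..255; message_bits: iterable of 0/1 values;
    level: fraction of pixels used for embedding (0.0 to 1.0).
    """
    rng = random.Random(seed)
    n_embed = int(round(level * len(pixels)))
    # Hypothetical position selection: a seeded random sample of pixel indices.
    positions = rng.sample(range(len(pixels)), n_embed)
    stego = list(pixels)
    bits = iter(message_bits)
    for pos in positions:
        bit = next(bits)
        stego[pos] = (stego[pos] & ~1) | bit  # only the least significant bit changes
    return stego

cover = list(range(100, 120))                 # 20 toy pixel values
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]      # 10 message bits, i.e. a 50% level
stego = lsb_embed(cover, message, level=0.5)
changed = sum(c != s for c, s in zip(cover, stego))
```

Only pixels whose LSB differs from the message bit actually change, so `changed` is at most the number of embedded bits.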

Some of the powerful methods for the analysis of steganographic images are described in [6], [7], [2].
We propose new measures and techniques for the detection and analysis of steganographically
embedded content. We show that both statistical and pattern classification techniques using our
proposed measures provide reasonable discrimination schemes for detecting embeddings of
different levels. Our measures are based on a few statistical properties of bit strings and
wavelet coefficients of image pixels.

In Section 2 we explain our approach towards classification of given data based on a feature
vector consisting of statistical measures and using Support Vector Machine (SVM) tools. In
Section 3 we propose the use of wavelet transforms for steganalysis. Our results presented
in Sections 2 and 3 show the efficacy of our measures in discriminating different levels of
embedding. We conclude with our plans for improved and finer steganalysis in Section 4.

This paper is an extension of our paper [4]. In this paper we provide an analysis of the
wavelet-based measure proposed in [4]. For a more detailed report, readers are referred to [3].



2 Classification Of The Given Data Using Statistics And SVM 
2.1 Classification Of Different Files 

Steganography is one kind of transformation of data into other data with the help of a cover.
So, if messages can be classified before embedding, we believe that we can classify them in
the transformed domain as well. Now assume we can classify the data in the transformed domain
into different classes. What next? Why do we need classification in the first place? Classification
is required because, if we classify a given part of the data, we can narrow down the possible
transformed space and we can predict what to expect next and what should not come next. This will
help us to predict message bits.

Can we really classify data? We use a statistical feature space. We propose a vector of
statistical measures [6] for this purpose. Our feature vector consists of nine statistical measures.
Thus, $\mu \in \mathbb{R}^9$. The measures are as follows.

$\mu_1$: Weighted sum of the range of k-gram frequencies. Let $f(k, j)$ denote the overlapping
frequency of the k-gram binary pattern of the integer $j$ in $S_i$. For example, $f(4, 3)$ = number
of occurrences of the pattern <0011> in $S_i$. For a 32 bit word $W$, we define

$$\mu_1(W) = \sum_{k=1}^{4} \left( \max_j f(k, j) - \min_j f(k, j) \right) 2^{4(k+1)}$$

We expect the measure $\mu_1$ to be smaller for random strings as compared to non-random
strings.
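
The overlapping k-gram frequencies $f(k, j)$ can be sketched in Python as below. The helper names `kgram_counts` and `kgram_range_sum` are hypothetical, and the per-k weights are passed in as a parameter rather than fixed, so the sketch makes no claim about the exact weighting used in the measure.

```python
def kgram_counts(bits, k):
    """Overlapping counts f(k, j) for every k-bit pattern j in `bits` (a list of 0/1)."""
    counts = [0] * (1 << k)
    for start in range(len(bits) - k + 1):
        j = 0
        for b in bits[start:start + k]:
            j = (j << 1) | b          # read the window as a k-bit integer
        counts[j] += 1
    return counts

def kgram_range_sum(bits, weights):
    """Weighted sum over k of (max_j f(k, j) - min_j f(k, j)).

    weights: dict mapping k to its weight (an assumption of this sketch).
    """
    total = 0
    for k, w in weights.items():
        counts = kgram_counts(bits, k)
        total += w * (max(counts) - min(counts))
    return total

word = [0, 0, 1, 1] * 8                     # a highly regular 32-bit word
f_4_3 = kgram_counts(word, 4)[3]            # occurrences of the pattern 0011
```

For this regular word the pattern 0011 starts at every fourth position, so `f_4_3` is 8, while many other 4-grams never occur; a random word would spread its 29 overlapping windows more evenly.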

$\mu_2$: Weighted sum of run lengths. Let the vector $\langle l_1, l_2, \ldots \rangle$ denote the sequence of run
lengths of 0's and 1's in a word $W$. Then we define,

$\mu_2(W) =$

where $c_i$ are specifically chosen weights. We set $c_i = 1 \;\forall i$, without loss of generality. For
random strings, we expect the measure $\mu_2$ to be smaller compared to non-random strings,
since one expects very few long runs.
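
The run-length vector used by this measure can be extracted as in the following sketch. It only computes the sequence of run lengths; how the runs are weighted and combined is left to the definition of the measure, and the function name `run_lengths` is an assumption.

```python
def run_lengths(bits):
    """Lengths of the maximal runs of equal bits, in order of appearance."""
    runs = []
    current = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            current += 1
        else:
            runs.append(current)
            current = 1
    if bits:
        runs.append(current)  # close the final run
    return runs

sample_runs = run_lengths([0, 0, 1, 1, 1, 0])
```

Here `sample_runs` is `[2, 3, 1]`, and the run lengths of any word always add up to its length in bits.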

$\mu_3$: Weighted sum of byte-wise Hamming weight transitions. Let $W = \langle b_0, b_1, b_2, b_3 \rangle$,
where the $b_i$'s are the bytes of the 32 bit word. Let $\#_1(b)$ denote the number of 1's in an 8 bit
byte $b$. Then we define,

$$\mu_3(W) = 2^{\#_1(b_0)} + 2^{\#_1(b_0 \oplus b_1)} + 2^{\#_1(b_1 \oplus b_2)} + 2^{\#_1(b_2 \oplus b_3)}$$

For random strings $W$, we expect $\mu_3(W)$ to be higher than for non-random strings. It is also
possible to define the measure $\mu_3$ with respect to overlapping bytes in a word, to measure the
smoothness/suddenness of transitions.
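
A small Python sketch of this measure follows. It assumes that each term is a power of two whose exponent is the Hamming weight, and that $b_0$ is the most significant byte of the word; both are assumptions of the sketch, as is the function name.

```python
def popcount(byte):
    """Number of 1 bits in an integer."""
    return bin(byte).count("1")

def hamming_transition_measure(word):
    """Sum of 2**popcount over b0 and the XORs of adjacent bytes of a 32-bit word."""
    # Assumption: b0 is the most significant byte.
    b = [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    terms = [b[0], b[0] ^ b[1], b[1] ^ b[2], b[2] ^ b[3]]
    return sum(2 ** popcount(t) for t in terms)

flat = hamming_transition_measure(0x00000000)         # no bits set, no transitions
alternating = hamming_transition_measure(0xFF00FF00)  # every byte transition flips 8 bits
```

A word of identical bytes gives small XOR terms, while abrupt byte-to-byte changes push the value up sharply.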

$\mu_4$: Fourier transform of the autocorrelation function of the sequence of bits in $W$. Let
$W = \langle a_0, \ldots, a_{31} \rangle$ be a 32 bit word. The autocorrelation function $A(W)$ is the sequence
$A(W) = \langle c_0, \ldots, c_{31} \rangle$ where $c_i = \sum_{j=0}^{31} a_j \cdot a_{j+i \pmod{32}}$, $i = 0, \ldots, 31$, and the multiplication
operation is over $F_2$ vectors. The discrete Fourier transform $F(A(W))$ is given by the sequence
$F(A(W)) = \langle f_0, \ldots, f_{31} \rangle$, where $f_k = \sum_{j=0}^{31} c_j\, \omega^{jk \bmod 32}$, $k = 0, \ldots, 31$. Here $\omega$ is a 32nd root
of unity. Finally, the measure $\mu_4(W)$ is a root mean square average of $F$ and is given by,

$$\mu_4(W) = \left( \sum_{j=0}^{31} f_j^2 \right)^{1/2}$$

For a random string $W$, we expect $\mu_4(W)$ to be smaller than for non-random strings.
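
The chain autocorrelation, DFT, root of the summed squares can be sketched with the Python standard library as follows. The sketch assumes that the bit products are ANDs accumulated as ordinary integers, that $\omega = e^{-2\pi i/32}$, and that squared magnitudes are summed; the function names are hypothetical.

```python
import cmath

def autocorrelation(bits):
    """Circular autocorrelation c_i = sum_j a_j * a_{(j+i) mod n}, with bit products as AND."""
    n = len(bits)
    return [sum(bits[j] & bits[(j + i) % n] for j in range(n)) for i in range(n)]

def dft(seq):
    """Plain O(n^2) discrete Fourier transform with omega = exp(-2*pi*i/n)."""
    n = len(seq)
    return [sum(c * cmath.exp(-2j * cmath.pi * j * k / n) for j, c in enumerate(seq))
            for k in range(n)]

def spectral_measure(bits):
    """Square root of the sum of squared magnitudes of the DFT of the autocorrelation."""
    spectrum = dft(autocorrelation(bits))
    return sum(abs(f) ** 2 for f in spectrum) ** 0.5

all_ones = spectral_measure([1] * 32)
all_zeros = spectral_measure([0] * 32)
```

For the all-ones word every autocorrelation value is 32, so all the energy sits in the zero-frequency term; this is the kind of strongly structured input for which the measure is large.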

$\mu_5$: Weighted Hadamard transform. Using an 8 x 8 Hadamard matrix $H$ and the operation
$y = Hx$, where $x$ is an 8 x 1 bit vector, we get the measure $\mu_5$. Here $x$ is a single data byte.
In particular, when the Hadamard transform is applied to images, $x$ is a pixel value.



$\mu_6, \mu_7, \mu_8, \mu_9$: These measures are based on the weighted entropy measure $\sum_i p_i \log p_i$, where
the $p_i$'s are the probabilities of 1, 2, 3 and 4-grams respectively. The weights are chosen
experimentally to amplify the range.
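
As an illustration of the entropy-based measures, the following sketch computes the unweighted Shannon entropy of the overlapping k-grams of a bit string. The sign convention ($-\sum_i p_i \log_2 p_i$), the base of the logarithm and the absence of weights are assumptions of this sketch; the experimentally chosen weights are not reproduced.

```python
import math

def kgram_entropy(bits, k):
    """Shannon entropy (in bits) of the empirical distribution of overlapping k-grams."""
    total = len(bits) - k + 1
    counts = {}
    for start in range(total):
        gram = tuple(bits[start:start + k])
        counts[gram] = counts.get(gram, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

balanced = kgram_entropy([0, 1] * 16, 1)   # equal numbers of 0s and 1s
constant = kgram_entropy([0] * 32, 1)      # a single symbol only
```

A balanced string attains the maximum of 1 bit for 1-grams, while a constant string has zero entropy for every k.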

SVM (Support Vector Machine) is a powerful tool for pattern classification. With the
introduction of kernel tricks, SVMs have become very popular in the machine learning community.
In some cases, the given data is not directly classifiable. Such cases can be solved by
transforming the given data to a higher dimensional space in such a way that, in the transformed
domain, the classification is much easier. Kernel tricks achieve this without actually transforming
the features to the higher dimensional space. The above statistics, i.e. $\mu$, are used as the
feature vector of the data. For training the SVMs, we measure statistics on 2000 words (8000 bytes)
of 30 different files to get 30 different $\mu$, and the same for each class. For testing, we measure
statistics on 2000 words of 20 different files from each class. Though we have used measures
calculated on 2000 words, experiments show that 400 words are sufficient for testing data for
classification. The SVM tool is taken from http://www.csie.ntu.edu.tw/~cjlin/libsvm/. We use the
most widely used 'Gaussian kernel' for the SVM. To avoid some features dominating the
classification, we scale $\mu$ to zero mean and unit variance. We use the following 8 different classes:

1. jpeg 2. bmp/pnm 3. zip files 4. gz files 5. text files 6. ps files 7. pdf files and 8. c 
files. 
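
The scaling of the feature vectors to zero mean and unit variance, mentioned before the class list, can be sketched as follows. This is a standard-library illustration only: the function name and the toy feature values are assumptions, the SVM training itself is not shown, and in practice the means and deviations estimated on the training set would be reused for the test set.

```python
import statistics

def standardize(rows):
    """Scale each feature (column) of `rows` to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # A constant feature has zero deviation; divide by 1.0 instead to avoid a crash.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

features = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]  # toy 2-dimensional feature vectors
scaled = standardize(features)
```

After scaling, the second feature no longer dominates distances merely because it is measured on a larger numeric range.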

We present the result in confusion matrix format. The (i, j)-th entry is the probability of data
belonging to class i being classified as class j. Table 1 shows the result. It can be seen that, as
desired, the (i, i)-th entry is very near to 1 for most of the classes.
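
A row-normalised confusion matrix of this kind, together with the overall accuracy, can be computed as in the sketch below. The function names and the toy labels are assumptions for illustration and do not reproduce the experimental data.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [i][j] is the fraction of class-i samples that were classified as class j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

truth = [0, 0, 1, 1]
predicted = [0, 1, 1, 1]
cm = confusion_matrix(truth, predicted, n_classes=2)
acc = accuracy(truth, predicted)
```

In this toy case class 1 is always recognised while class 0 is recognised half of the time, giving `cm == [[0.5, 0.5], [0.0, 1.0]]` and an accuracy of 0.75.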

Table 1. Confusion Matrix For Data Classification 
We used a total of 180 files for testing and were able to classify them with 82.22% accuracy.



2.2 Analysis of LSB Planes From Stegoed and non-Stegoed Images 

In the above experiments, we measure statistics on the whole sequence of bits of the given data. An
embedding operation is performed on the LSBs of an image. So, to capture the perturbation due to
the steganographic operation, we measure statistics only on the LSBs of images, in pursuit of our
aim of detecting levels of embedding in a given image. In this direction, we first considered only
two classes of LSB planes: LSB planes obtained from non-stegoed images and LSB planes
obtained from images with 50% embedding. The same $\mu$ defined above is measured on LSB
planes of 30 images from both classes (total 180 = 30 * 3 (colors/image) * 2 classes). Out of
these, 150 are used for training the SVM and testing was performed on 30. Here, we have two
classes:

1. LSB plane of non-stegoed image. 2. LSB plane of stegoed image.

We present the results as a confusion matrix in Table 2.

Table 2. Confusion Matrix For 2 Category LSB Classification 
Overall accuracy is 85%. Motivated by these results, we now consider this as a 4 category
classification problem. The different classes defined are,

1. LSB plane of non-stegoed image. 2. LSB plane of 25% stegoed image. 3. LSB plane
of 50% stegoed image. 4. LSB plane of 75% stegoed image.

The confusion matrix for this experiment is in Table 3.



Table 3. Confusion Matrix For 4 Category LSB Classification 
The overall accuracy is 65%. Thus, this experiment alone is not sufficient for detection of
levels of embedding. Hence we take an alternative approach.

3 Analysis Of Images Using Wavelet Transforms 

Our feature vector $\mu$ considers a linear sequence of bits as input. However, image properties
are in general captured more accurately by two dimensional transforms. Our goal is to classify
images accurately under different levels of embedding. The approaches in Sections 2.1 and 2.2
serve as good handles in this direction.

To further enhance our understanding of the effects of embedding, we study the behavior
of wavelet coefficients. Farid et al. [5, 8] have shown that the wavelet domain can capture image
characteristics, such as whether an image is a natural image, a computer generated one
or a scanned one. They have shown that the feature vector given by them can be used
for universal steganalysis. Their aim was only to find whether an image contains any kind of
hidden information or not. We further explore the detection of the level of embedding.

3.1 Hypothesis 

Our motivation to study the wavelet domain, rather than pixels directly, is that the averages
in wavelet coefficients smooth the pixel values, and hence it is expected that even minor
anomalies in neighboring pixels introduced by the stego operation would lead to amplified changes
in the wavelet domain. We intend to capture and attempt to calibrate these changes w.r.t.
graded embedding. We consider second level wavelet sub-bands of images. The Haar wavelet
is used as the mother wavelet.

3.2 Notations 

For our experiments, we use 15 images that do not contain any hidden information. These are
images taken with a Nikon Coolpix camera at full 8M resolution, with most of the images stored
in RAW format. These images are then cropped to get 800 x 600 images without doing any
image processing operations. Let,

$I = \{I_j : j = 0, 1, 2, \ldots, 14\}$ be the set of natural unstegoed images.
$k$ : The initial LSB embedding present in a given image, i.e. k% of the LSBs
of an image have been modified by steganographic operations.
$S_k$ : The Start Image, that is, an image $\in I$ with k% embedding.
(k is the unknown to be detected.)
$i$ : The forced embedding level
(will be defined in Section 3.3).
$S_{ki}$ : An image $\in I$ with k% original embedding and i% forced embedding.

3.3 Our Approach 

Let $S_k$ be the given image. We call this the start image. We do additional embedding
on it to get $S_{ki}$ and refer to this kind of embedding as 'forced embedding'. Our approach is to
compute some transform on both $S_k$ and $S_{ki}$, and study a measure of the difference between
the transform coefficients for finding k. This procedure is explained with the help of Fig. 1.
In Fig. 1 the transform used is the wavelet transform.

3.4 Definitions 

We consider second level wavelet sub-bands. So, each 4 * 4 block in an image contributes to
exactly one wavelet coefficient in each sub-band, viz. LL, LH, HL, HH. We consider the
2nd level LL sub-band coefficients, since most of the energy gets concentrated in this sub-band.
Let a 4 * 4 subblock of an image be denoted by:

$$\begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$$

The 2nd level LL wavelet coefficient is given by

$$\frac{1}{4} (a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p)$$

(Note: The 2nd level LL sub-band size is 1/4 of the original image size in both directions.)




Fig. 1. The process to get $\eta$ (from the count of differing coefficients)

The LH coefficient is given by

$$\frac{1}{4} \{ (a + b + e + f + i + j + m + n) - (c + d + g + h + k + l + o + p) \}$$

The HL coefficient is given by

$$\frac{1}{4} \{ (a + b + c + d + e + f + g + h) - (i + j + k + l + m + n + o + p) \}$$

The HH coefficient is given by

$$\frac{1}{4} \{ (a + b + e + f + k + l + o + p) - (c + d + g + h + i + j + m + n) \}$$
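
The four second level coefficients of a single 4 * 4 block can be sketched in Python as follows. The scaling factor of 1/4 (the orthonormal Haar normalisation) and the function name are assumptions of this sketch; the groupings of pixels follow the LL, LH, HL and HH expressions above.

```python
def second_level_haar(block, scale=0.25):
    """Return (LL, LH, HL, HH) for a 4x4 block given as a list of four rows.

    scale: normalisation factor applied to every coefficient (assumed to be 1/4).
    """
    def region(rows, cols):
        return sum(block[r][c] for r in rows for c in cols)

    total = region(range(4), range(4))
    left, right = region(range(4), (0, 1)), region(range(4), (2, 3))
    top, bottom = region((0, 1), range(4)), region((2, 3), range(4))
    diagonal = region((0, 1), (0, 1)) + region((2, 3), (2, 3))
    anti_diagonal = total - diagonal
    return (scale * total,
            scale * (left - right),
            scale * (top - bottom),
            scale * (diagonal - anti_diagonal))

flat_block = [[8] * 4 for _ in range(4)]
edge_block = [[1, 1, 0, 0] for _ in range(4)]   # bright left half, dark right half
flat_coeffs = second_level_haar(flat_block)
edge_coeffs = second_level_haar(edge_block)
```

A constant block has only an LL component, while a vertical edge through the middle of the block shows up in LH and leaves HL and HH at zero.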

Let the image be considered as made up of 4 * 4 blocks.

Let $P$ denote a 4 * 4 block in $S_k$, $P = (u_{ij})$,
and $P'$ denote the corresponding 4 * 4 block in $S_{ki}$, $P' = (u'_{ij})$.

We define the following random variables,

$$X_0 = \#\left\{ \left| \sum (u_{ij} - u'_{ij}) \right| \geq 1 : \text{for all non-overlapping blocks } P \text{ in } S_k \right\}$$

$$X_1 = \left| \sum (u_{ij} - u'_{ij}) \right| \text{ over non-overlapping blocks } P \text{ in } S_k$$

$$X_2 = \sum (u_{ij} - u'_{ij}) \text{ over non-overlapping blocks } P \text{ in } S_k$$

$$\eta = \frac{X \times 500}{\text{image size in pixels}}$$

$$\eta^{snr} = \text{SNR between the 2nd level LL sub-bands of } S_{ki} \text{ and } S_k$$

We have chosen the factor 500 to normalize the quantity $\eta$ to be near 100 for the size
of images being considered (800 * 600).
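
A minimal sketch of the difference count and of the normalised measure is given below. It assumes that the quantity being normalised is the number of 4 * 4 blocks whose pixel sums differ between the start image and the forced-embedded image (equivalently, the number of differing 2nd level LL coefficients); the function names and the toy 8 x 8 images are illustrative only.

```python
def block_sums(image, size=4):
    """Pixel sums of the non-overlapping size x size blocks of a 2-D list image."""
    height, width = len(image), len(image[0])
    return [sum(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]

def eta(start_image, forced_image, factor=500):
    """Count of blocks whose sums differ, scaled by factor / (number of pixels)."""
    differing = sum(1 for a, b in zip(block_sums(start_image), block_sums(forced_image))
                    if a != b)
    n_pixels = len(start_image) * len(start_image[0])
    return differing * factor / n_pixels

start = [[10] * 8 for _ in range(8)]
one_change = [row[:] for row in start]
one_change[0][0] += 1                       # a single pixel raised by 1
balanced = [row[:] for row in start]
balanced[0][0] += 1
balanced[1][1] -= 1                         # +1 and -1 inside the same block cancel
eta_one = eta(start, one_change)
eta_balanced = eta(start, balanced)
```

The second case illustrates why the analysis has to account for cancelling changes: a block in which increments and decrements balance leaves its LL coefficient untouched and is not counted.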



3.5 Analysis 

Let,

$p$ = probability that the LSB of a pixel in a 4 * 4 block is even, i.e. '0', in the cover $S$

$p'$ = probability that the LSB of a pixel in the 4 * 4 block is even, i.e. '0', in $S_k$

$$p' = \frac{k}{2} + (1 - k)\, p \qquad (1)$$

$p''$ = probability that the LSB of a pixel in the 4 * 4 block is even, i.e. '0', in $S_{ki}$

$Pr$ = probability that a particular 2nd level LL wavelet coefficient in $S_{ki}$ is different
from the corresponding wavelet coefficient in $S_k$

$$X = \frac{\text{Image size in pixels}}{4 \times 4} \times Pr$$

$$X \propto Pr$$

Let,

$\eta_{ki}$ = $\eta$ with k% initial embedding and i% forced embedding.



Theorem:

i. $\eta_{ki}$ increases with i and decreases slightly with k.

ii. $\eta^{snr}$ increases with increase in k.

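Proof: Observe that,

$$\eta_{ki} \propto X \propto Pr$$

$$Pr = 1 - \mathrm{prob}\left\{ \left| \sum (u_{ij} - u'_{ij}) \right| = 0 \right\}$$

= 1 - prob{ No pixel in the particular 4 * 4 block has been replaced with data bits
OR
2 pixels have been replaced with data bits in such a way that one pixel value
increases by 1 and the other decreases by 1
OR
4 pixels have been replaced with data bits in such a way that two pixel values
increase by 1 and the other two decrease by 1
OR
...
OR
16 pixels have been replaced with data bits in such a way that for 8 pixels the value
increases by 1 and for the other 8 it decreases by 1 }

$$
\begin{aligned}
Pr = 1 - \Big\{ & (1 - i/2)^{16} \\
 & + (1 - i/2)^{14} \, (i/2)^{2} \; {}^{16}C_{2} \; p' (1 - p') \; \frac{2!}{1! \, 1!} \\
 & + (1 - i/2)^{12} \, (i/2)^{4} \; {}^{16}C_{4} \; p'^{2} (1 - p')^{2} \; \frac{4!}{2! \, 2!} \\
 & + (1 - i/2)^{10} \, (i/2)^{6} \; {}^{16}C_{6} \; p'^{3} (1 - p')^{3} \; \frac{6!}{3! \, 3!} \\
 & + \cdots \\
 & + (i/2)^{16} \; {}^{16}C_{16} \; p'^{8} (1 - p')^{8} \; \frac{16!}{8! \, 8!} \Big\}
\end{aligned}
\qquad (2)
$$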

It is intuitively clear that $\eta_{ki}$ increases with i. It can also be seen from equation (2) for
$Pr$ that $\eta_{ki}$ increases with i for 0 < i < 1. This can be proved by differentiating equation (2)
w.r.t. i, or can be empirically verified with ease. A close look at the equation reveals that $Pr$
depends upon $(p'(1 - p'))^{n}$ for some positive integer powers $n$. Given a start image $S_k$, $p'$ is
fixed, but it in turn depends upon $p$ and $k$ (refer to Eq. (1)). As k increases towards 1, $p'$ goes
to 1/2 irrespective of $p$. In general, adjacent pixels are very similar in natural images. So, $p$ is
biased towards 0.35 or 0.65.

$p'(1 - p')$ increases as k increases,
$\Rightarrow$ $Pr$ decreases as k increases.
$\Rightarrow$ $\eta_{ki}$ decreases as k increases.

Thus, as $Pr$ decreases with increasing k, the number of wavelet coefficients of $S_k$ and $S_{ki}$ that
are equal increases, i.e. the noise in $W(S_{ki})$ w.r.t. $W(S_k)$ decreases.
$\Rightarrow$ $\eta^{snr}$ increases as k increases.
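
The two monotonic trends used in the proof can also be checked numerically. The sketch below evaluates the expression for $Pr$ under the assumption that the omitted middle terms follow the same pattern as the terms shown (an even number m of changed pixels, half of them increasing), with i and k given as fractions between 0 and 1; the function names are hypothetical.

```python
from math import comb

def p_prime(p, k):
    """Probability of an even LSB after k (fraction) initial embedding: k/2 + (1 - k) p."""
    return k / 2 + (1 - k) * p

def prob_coefficient_differs(i, p1):
    """Pr for forced embedding fraction i, given p' = p1, summing all even m from 0 to 16."""
    q = i / 2
    unchanged = 0.0
    for m in range(0, 17, 2):
        half = m // 2
        # comb(m, half) equals m! / (half! * half!), the balancing factor in each term.
        unchanged += ((1 - q) ** (16 - m) * q ** m * comb(16, m)
                      * (p1 * (1 - p1)) ** half * comb(m, half))
    return 1 - unchanged

# Pr as a function of the forced embedding i, for a cover bias p = 0.35 and k = 0.
by_forced = [prob_coefficient_differs(i / 10, p_prime(0.35, 0.0)) for i in range(0, 11)]
# Pr as a function of the initial embedding k, at a fixed forced embedding i = 0.2.
by_initial = [prob_coefficient_differs(0.2, p_prime(0.35, k / 10)) for k in range(0, 11)]
```

Under these assumptions `by_forced` is increasing and `by_initial` is decreasing, in line with the two implications stated above; the decrease with k is small because $p'(1 - p')$ varies only between about 0.23 and 0.25.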

We have verified these experimentally as follows. 




Fig. 2. Graph of $\eta_{ki}$ vs. 'i' for various 'k', Hide4PGP



3.6 Results 

We use the stego algorithm Hide4PGP in our experiments. In our experiments, we use i =
10, ..., 100 and k = 0, 10, 20, 30, 40, 50. The plots of $\eta_{ki}$ vs. i for various k are shown in Fig. 2.
For a particular forced embedding i, it can be observed that $\eta_{ki}$ decreases as k increases.
Encouraged by this monotonic trend, we now look closely at the variations in the measure $\eta$ at
a fixed forced embedding of i = 20%, with respect to k on different start images. The results
are shown in Fig. 3.

The continuous line shows the average value of $\eta_{k,20}$ vs. k. The other curves show the $\eta$
values for the individual images. These also show the monotonic decreasing trend around the
average value. We note that such trends are quite significant, especially at low levels of 20%
embedding. Thus, this serves as a first indicator for detecting approximately the amount of
embedding (even at low levels) in any given image.

It is quite difficult to conduct a large number of data generation experiments under various
parameter choices using a public domain tool, as we do not get appropriate handles into
the source code. Hence, in our lab we have built a tool called CSA-Tool for simulating the
behavior of S-Tools. We have taken care to incorporate our own functions for encryption,
randomized location generation and embedding, analogous to the steps performed by S-Tools.
The statistical characteristics of our tool would closely resemble those of S-Tools.
We performed similar experiments as detailed above using the CSA-Tool. Fig. 4 and Fig. 5 show
the results. We note that the results follow the same trends as for Hide4PGP. However,
the separations in Fig. 4 are smaller than in Fig. 2, and the fluctuations in Fig. 5 are larger than in
Fig. 3. A reason is that the CSA-Tool (and S-Tools) employs stronger random generators
for choosing the LSBs for embedding than the tool Hide4PGP.

Fig. 3. Graph of $\eta$ vs. 'k' at fixed forced embedding 20% for various images, Hide4PGP

The plot of $\eta^{snr}$ vs. i for various k is shown in Fig. 6. As per the theorem proved in
Section 3.5, it can be observed that, for a particular forced embedding i, $\eta^{snr}$ increases
as k increases. The zoomed version of Fig. 6 for i = 70 is shown in Fig. 7. Encouraged by
this monotonic trend, we now look closely at the variations in the measure $\eta^{snr}$ at a fixed forced
embedding of i = 70%, with respect to k on different start images. The results are shown in
Fig. 3.6. The continuous line shows the average value of $\eta^{snr}_{k,70}$ vs. k. The other curves show the
$\eta^{snr}_{k,70}$ vs. k values for the individual images. These also show the same monotonic trend
around the average value.

4 Conclusion 

We discussed two of our approaches towards the analysis of stego images for detection of levels of
embedding. Our approach of using wavelet coefficient perturbations holds promise. In future, we
plan to use this measure in addition to statistical measures to arrive at finer detection.
Fig. 7. Graph of $\eta^{snr}$ vs. 'i' for various 'k', CSA Tool, zoomed version